{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "参考链接：https://github.com/dmmiller612/bert-extractive-summarizer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "抽取式摘要\n",
    "- 首先对句子编码\n",
    "- 然后聚类算法，找到最接近中心的句子\n",
    "- 然后共指解析，找到句子中代词的含义"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2020-06-04T07:30:54.162904Z",
     "start_time": "2020-06-04T07:30:54.157517Z"
    }
   },
   "source": [
    "# BertSum"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "原始连接：https://github.com/nlpyang/BertSum    \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](../images/bertSum.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 输入处理：\n",
    "    + 在每个句子的句首添加 `[CLS]` 句尾添加 `[SEP]` 标记；\n",
    "    + 多个句子交替使用句子编码 $E_A$ 和 $E_B$ 表征句子位置\n",
    "    + 位置编码与原始 BERT 模型相同\n",
    "- 每个 `[CLS]` 对应的输出 $T_1$,$T_2$,$T_3$ 作为每个句子的特征向量：\n",
    "    + 接`sigmoid`线性分类层，计算每个句子作为摘要的概率\n",
    "    + 或者，再接Transformer编码层，其中 $h^0$ 由句子位置编码$E_A,E_B$加上$T_1,T_2...$得到，$l$ 表示层的深度，$\\text{LN}$标识 layer  normalization 。最后模型再接`sigmoid`线性分类层\n",
    "        $$\n",
    "        \\hat{h}^l = \\text{LN}(h^{l-1}+\\text{MHAtt}(h^{l-1}) )      \\\\    \n",
    "        h^l = \\text{LN}(\\hat{h}^{l}+\\text{FFN}(\\hat{h}^{l}) ) \n",
    "        $$\n",
    "    + 或者再接 LSTM 层，然后`sigmoid`分类器\n",
    "- Trigram Blocking 方法，删除可能造成冗余的摘要句\n",
    "    - 给定已经抽取出来的部分摘要 S（多个句子组成），和候选句子，如果候选句子和 S 存在一个重合的语块（由三个单词构成），就忽略该句子。    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "position": {
    "height": "775px",
    "left": "986px",
    "right": "20px",
    "top": "129px",
    "width": "791px"
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
